arXiv: 1508.05117v4 [cs.CC] 6 Oct 2016 


The Backtracking Survey Propagation Algorithm 
for Solving Random K-SAT Problems 


Raffaele Marino^, Giorgio Parisi^, and Federico Ricci-Tersenghi^ 

^NORDITA and AlbaNova University Centre, Dept, of Compntational Biology, KTH-Royal Institnte 
of Technology and Stockholm University, Roslagstnllsbacken 23, SE-10691 Stockholm, Sweden, 

email: rmarinoOkth. se 

^Dipartimento di Fisica, Sapienza Universita di Roma and Istitnto Nazionale di Fisica Nncleare, 
Sezione di Romal and CNR-Nanotec, Unita di Roma, P.le Aldo Moro 5, 1-00185, Roma, Italy, 
emails: giorgio. parisi@romal. infn. it, f ederico. ricciOromal. infn. it 


October 7, 2016 


Abstract 

Discrete combinatorial optimization plays a central role in many scientific disciplines, however for hard 
problems we lack linear time algorithms that would allow us to solve very large instances. Moreover it is still 
unclear what are the key features that make a discrete combinatorial optimization problem hard to solve. Here 
we study random A-satisfiability problems with A = 3,4 which are known to be very hard close to the SAT- 
UNSAT threshold, where problems stop having solutions. We show that the Backtracking Survey Propagation 
algorithm, in a time practically linear in the problem size, is able to find solutions very close to the threshold, in 
a region unreachable by any other algorithm. All solutions found have no frozen variables, thus supporting the 
conjecture that only unfrozen solutions can be found in linear time, and that a problem becomes impossibile to 
solve in linear time when all solutions contain frozen variables. 


Optimization problems with discrete variables are 
widespread among scientific disciplines and often 
among the hardest to solve. The A-satisfiability (A- 
SAT) problem is a combinatorial discrete optimiza¬ 
tion problem of N Boolean variables, x = {xi}i=i^N, 
submitted to M constraints. Each constraint, called 
clause, is in the form of an OR logical operator of A 
literals (variables and their negations): the problem is 
solvable when there exists at least one configuration of 
the variables, among the 2^ possible ones, that satis¬ 
fies all constraints. The A-SAT problem for A > 3 is a 
central problem in combinatorial optimization: it was 
among the first problems shown to be AP-complete 
m la [3] and is still very much studied. A growing 
collaboration between theoretical computer scientists 
and statistical physicists has focused on the random 
A-SAT ensemble where each formula is gener¬ 

ated by randomly choosing M = aN clauses of A lit¬ 
erals. Formulas from this ensemble become extremely 
hard to solve when the clause to variable ratio a grows 
[6]: nevertheless, even in this region, the locally tree¬ 
like structure of the factor graph [7j , representing the 
interaction network among variables, makes the ran¬ 
dom A-SAT ensemble a perfect candidate for analytic 


computations. The study of random A-SAT problems 
and of the related solving algorithms is likely to shed 
light on the origin of the computational complexity 
and to allow for the development of improved solving 
algorithms. 

Both numerical [8] and analytical mm evidence 
suggest that a threshold phenomenon takes place in 
random A-SAT ensembles: in the limit of very large 
formulas, A —)• oo, a typical formula has a solution for 
a < as{K), while it is unsatisfiable for a > as{K). It 
has been very recently proved in Ref. m that for A 
large enough the SAT-UNSAT threshold as{K) exists 
in the A —)• oo limit and coincides with the predic¬ 
tion from the cavity method of statistical physics [12] . 
A widely accepted conjecture is that the SAT-UNSAT 
threshold as{K) exists for any value of A. Finding 
solutions close to Os is very hard, and all known al¬ 
gorithms running in polynomial time fail to find solu¬ 
tions when a > Oa, for some Oa < Os- Actually, each 
algorithm ALG has it own algorithmic threshold , 
such that the probability of finding a solution vanishes 
for a > in the large A limit. For most algorithms 
cta^'^ is well below Og. We define Oa = maxALc 
the threshold beyond which no polynomial-time algo- 
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rithm can find solntions. There are two main open 
questions: to find improved algorithms having a larger 
and to understand what is the theoretical upper 
bound Oa- Here we present progress on both issues. 

The best prediction about the SAT-UNSAT thresh¬ 
old comes from the cavity method [lainiiiaiis]: for 
example, a;s(A = 3) = 4.2667 [H] and q;s(A = 4) = 9.931 
[T5] . Actually the statistical physics study of random 
A-SAT ensembles also provides us with a very detailed 
description of how the space of solutions changes when 
a spans the whole SAT phase (0 < a < ccs). Let us 
consider typical formulas in the large N limit and the 
vast majority of solutions in these formulas (i.e. typi¬ 
cal solutions), we know that, at low enough a values, 
the set of solutions is connected, so that they form a 
single cluster. In SAT problems we say 2 solutions are 
neighbors if they differ in the assignment of just one 
variable; in other problems (e.g. XOR-SAT [16]) this 
definition of neighbor needs to be relaxed, because a 
pair of solutions differing in just one variable are not 
allowed by the model definition. As long as the notion 
of neighborhood is relaxed to Hamming distances o{N) 
all the picture of the solution space based on statistical 
physics remains unaltered. 

As a increases, not only the number of solutions 
decreases, but at Od the random A-SAT ensemble un¬ 
dergoes a phase transition: the space of solutions shat¬ 
ters into an exponentially large (in the problem size N) 
number of clusters; two solutions belonging to different 
clusters have a Hamming distance 0{N). If we define 
the energy function A(x) as the number of unsatisfied 
clauses in configuration x, it has been found [T2| that 
for a > ad the energy A(x) has exponentially many (in 
N) local minima of positive energy, which may trap al¬ 
gorithms that look for solutions by energy relaxation 
(e.g. Monte Carlo simulated annealing). 

Further increasing a, each cluster loses solutions 
and shrinks, but the most relevant change is in the 
number of clusters. The cavity method allows us to 
count clusters of solutions as a function of the num¬ 
ber of solutions they contain m using this very de¬ 
tailed description several other phase transitions have 
been identified usmsi. For example, there is a value 
Oc where a condensation phase transition takes place, 
such that for a > Oc the vast majority of solutions 
belong to a sub-exponential number of clusters, lead¬ 
ing to effective long-range correlations among variables 
in typical solutions, which are hard to approximate 
by any algorithm with a finite horizon. In general 
ckd < CKc < tts holds. Most of the above picture of 
the solution space has been proven rigorously in the 
large A limit [III [20| . 

Moving to the algorithmic side, a very interesting 
question is whether such a rich structure of the solution 


space affects the performance of searching algorithms. 
While clustering at ad may have some impact on algo¬ 
rithms that sample solutions uniformly |21j . many al¬ 
gorithms exist that can find at least one solution with 
a > ad [I2l[22l|23|. 

A solid conjectnre is that the hardness of a formula is 
related to the existence of a snbset of highly correlated 
variables, which are very hard to assign correctly alto¬ 
gether; the worst case being a subset of variables that 
can have a unique assignment. This concept was in¬ 
troduced with the name of backbone in Ref. [23] . The 
same concept applied to solutions within a single clus¬ 
ter lead to the definition of frozen variables (within 
a cluster) as those variables taking the same value in 
all solutions of the cluster [25]. It has been proven in 
Ref. [26] that the fraction of frozen variables in a clus¬ 
ter is either zero or lower bounded by (ae^)“^/^^“^^; 
in the latter case the cluster is called frozen. 

According to the above conjecture, finding a solu¬ 
tion in a frozen cluster is hard (in practice it should 
require a time growing exponentially with N). So the 
smartest algorithm running in polynomial time should 
search for unfrozen clusters as long as they exist. Un- 
fortnnately counting unfrozen clusters is not an easy 
job, and indeed a large deviation analysis of their num¬ 
ber has been achieved only very recently |27) for a dif¬ 
ferent and simpler problem (bicoloring random regular 
hypergraphs). For random A-SAT only partial resnlts 
are known, that can be stated in terms of two thresh¬ 
olds: for a > Or (rigidity) typical solutions are in 
frozen cluster (but a minority of solutions may still 
be unfrozen), while for a > af (freezing) all solu¬ 
tions are frozen. It has been rigorously proven |28l [29] 
that Of < as holds strictly for A > 8. For small A, 
which is the interesting case for benchmarking solv¬ 
ing algorithms, we know a-^ = 9.883(15) for A = 4 
from the cavity method |15j . while for A = 3 the es¬ 
timate af = 4.254(9) comes from exhaustive enumer¬ 
ations in small formulas (A < 100) [30] and is likely 
to be affected by strong finite size effects. In general 
Od < CKr < Of < Os holds. 

The conjecture above implies that no polynomial 
time algorithm can solve problems with a > af, but 
also finding solutions close to the rigidity threshold a^ 
is expected to be very hard, given that unfrozen solu¬ 
tions becomes a tiny minority. And this is indeed what 
happens for all known algorithms. Since we are inter¬ 
ested in solving very large problems we only consider 
algorithms whose rnnning time scales almost linearly 
with N and we measure performance of each algorithm 
in terms of its algorithmic threshold 

Solving algorithms for random A-SAT problems can 
be roughly classified in two main categories: algo¬ 
rithms that search for a solution by performing a bi- 
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ased random walk in the space of configurations and 
algorithms that try to build the solutions by assign¬ 
ing variables, according to some estimated marginals. 
WalkSat [3l], focused Metropolis search (FMS) |22j 
and ASAT |23j belong to the former category; while in 
the latter category we find Belief Propagation guided 
Decimation (BPD) |21j and Survey Inspired Decima¬ 
tion (SID) [32]. All these algorithms are rather effec¬ 
tive in finding solutions to random AT-SAT problems: 
e.g. ioT K = A we have = 9.05, a™® 9.55 

and ~ 9.73 to be compared with a much lower 
algorithmic threshold = 5.54 achieved by Gen¬ 
eralized Unit Clause, the best algorithm whose range 
of convergence to a solution can be proven rigorously 
|33j . Among the efficient algorithms above, only BPD 
can be solved analytically [2T] to find the algorithmic 
threshold for the others we are forced to run 

extensive numerical simulations to measure . 

At present the algorithm achieving the best perfor¬ 
mance on several constraint satisfaction problems is 
SID, which has been successfully applied to the ran¬ 
dom AT-SAT problem m and to the coloring problem 
j34| . The statistical properties of the SID algorithm for 
AT = 3 have been studied in details in Refs. |35l[32|. Nu¬ 
merical experiments on random 3-SAT problems with 
a large number of variables, up to = 3 x 10®, show 
that in a time that is approximately linear in N the 
SID algorithm finds solutions up to a®™ ~ 4.2525 
[35] . that is definitely smaller, although very close to, 
as(A' = 3) = 4.2667. In the region a®™ < a < as the 
problem is satisfiable for large N, but at present no 
algorithm can find solutions there. 

To fill this gap we study a new algorithm for find¬ 
ing solutions to random TC-SAT problems, the Back¬ 
tracking Survey Propagation (BSP) algorithm. This 
algorithm (fully explained in the Methods section) is 
based, as SID, on the survey propagation (SP) equa¬ 
tions derived within the cavity method [El[35l[32] that 
provide an estimate on the total number of clusters 
■^cius = exp(S). The BSP algorithm, like SID, aims at 
assigning gradually the variables such as to keep the 
complexity S as large as possibile, i.e. trying not to 
kill too many clusters 13S]- While in SID each vari¬ 
able is assigned only once, in BSP we allow unsetting 
variables already assigned such as to backtrack on pre¬ 
vious non-optimal choices. In BSP the r parameter is 
the ratio between the number of backtracking moves 
(unsetting one variable) and the number of decimation 
moves (assigning one variable), r < 1 must hold and 
for r = 0 we recover the SID algorithm. The running 
time scales as A^/(l — r), with a slight overhead for 
maintaining the data structures, making the running 
time effectively linear in N for any r < 1. 

The idea supporting backtracking [SH] is that a 
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Figure 1: Fraction of random 4-SAT instances 
solved by BSP as a function of the constraints per 
variable ratio a. The average is computed over 100 
instances with N = 5 000 (solid symbols) and N = 
50 000 (empty symbols) variables. The vertical line is 
the best estimate for Or and the shaded region is the 
statistical error on this estimate. For each instance, the 
algorithm has been run once; on instances not solved 
on the first run, a second run rarely (< 1%) finds a 
solution. The plot shows that the backtracking (r > 0) 
definitely makes the BSP algorithm more efficient in 
finding solutions. Although data become sharper by 
increasing the problem size A", a good estimation of the 
algorithmic threshold from these datasets is unfeasible. 


choice made at the beginning of the decimation pro¬ 
cess, when most of the variables are unassigned, may 
turn to be suboptimal later on; if we re-assign a vari¬ 
able that is no longer consistent with the current best 
estimate of its marginal probability, we may get a bet¬ 
ter satisfying configuration. We do not expect the 
backtracking to be essential when correlations between 
variables are short ranged, but approaching as we 
know that correlations become long ranged and thus 
the assignment of a single variable may affect a huge 
number of other variables: this is the situation when 
we expect the backtracking to be crucial. 

This idea may look similar in spirit to the sur¬ 
vey propagation reinforcement (SPR) algorithm [37] . 
where variables are allowed to change their most likely 
value during the run, but in practice BSP works much 
better. In SPR, once reinforcement fields are large, 
the re-assignment of any variable becomes unfeasible, 
while in BSP variables can be re-assigned to better val¬ 
ues until the very end, and this is a major advantage. 
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Results 

Probability of finding a SAT assignment The 

standard way to study the performance of a solving 
algorithm is to measure the fraction of instances it can 
solve as a function of a. We show in Fig. such a frac¬ 
tion for BSP run with three values of the r parameter 
(r = 0, 0.5 and 0.9) on random 4-SAT problems of two 
different sizes {N = 5 000 and N = 50 000). The prob¬ 
ability of finding a solution increases both with r and 
A, but an extrapolation to the large A limit of these 
data is unlikely to provide a reliable estimation of the 
algorithmic threshold 

In each plot having a on the abscissa, the right end 
of the plot coincides with the best estimate of as, in 
order to provide an immediate indication of how close 
to the SAT-UNSAT threshold the algorithm can work. 

Order parameter and algorithmic threshold In 

order to obtain a reliable estimate of a®®^ we look for 
an order parameter vanishing at a®®^ and having very 
little finite size effects. We identify this order param¬ 
eter with the quantity Sres/Ares, where Sres and Ares 
are respectively the complexity (i.e. log of number of 
clusters) and the number of unassigned variables in the 
residual formula. As explained in Methods, BSP as¬ 
signs and re-assigns variables, thus modifying the for¬ 
mula, until the formula simplifies enough that the SP 
fixed point has only null messages: the residual formula 
is defined as the last formula with non-null SP fixed 
point messages. We have experimentally observed that 
the BSP algorithm (as the SID one [35]) can simplify 
the formula enough to reach the trivial SP fixed point 
only if the complexity S remains strictly positive dur¬ 
ing the whole decimation process. In other words, on 
every run where S becomes very close to zero or neg¬ 
ative, SP stops converging or a contradiction is found. 
This may happen either because the original problem 
was unsatisfiable or because the algorithm made some 
wrong assignments incompatible with the few avail¬ 
able solutions. Thanks to the above observation we 
have that Sres > 0 and thus a null value for the mean 
residual complexity signals that the BSP algorithm is 
not able to find any solution, and thus provides a valid 
estimate for the algorithmic threshold a®®^. From the 
statistical physics solution to random A-SAT problems 
we expect Sres to vanish linearly in a. 

As we see in panel (a) of Fig. the mean value 
of the intensive mean residual complexity Sres/Ares is 
practically size-independent and a linear fit provides 
a very good data interpolation: tiny finite size effects 
are visible in the largest A datasets only close to the 
dataset right end. The linear extrapolation predicts 
q,bsp py g_g k = a and r = 0.9), which is slightly 
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Figure 2: BSP algorithmic threshold on random 
4-SAT problems. The residual complexity per vari¬ 
able, Sres/Ares, goes to zero at the algorithmic thresh¬ 
old a®®^. (a) The very small finite size effects, mostly 
producing a slight downward curvature at the right 
end, allow for a very reliable estimate of a®®^ via a 
linear fit. For random 4-SAT problems solved by BSP 
with r = 0.9 we get a®®^ ~ 9.9, slightly beyond the 
rigidity threshold = 9.883(15), marked by a vertical 
line (the shaded area being its statistical error), (b) 
The same linear extrapolation holds for other values of 
r (red dotted line for r = 0.5 and blue dashed line for 
r = 0). The black line is the fit to r = 0.9 data shown 
in panel (a). SID without backtracking (r = 0) has a 
much lower algorithmic threshold, a®™ ~ 9.83. Error 
bars in both panels are the standard error on the mean 
(sem). 


above the rigidity threshold ar = 9.883(15) computed 
in Ref. [T5| and reported in the plot with a shaded re¬ 
gion corresponding to its statistical error (the value of 
Of in this case is not known, but a®®^ < af < as should 
hold). Although for the finite sizes studied no solution 
has been found beyond ar. Fig. [^suggests that in the 
large A limit BSP may be able to find solutions in a 
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region of a where the majority of solutions is in frozen 
clusters and thus very hard to find. We show below 
that BSP actually finds solutions in atypical unfrozen 
clusters, as it has been observed for some smart al¬ 
gorithms solving other kind of constraint satisfaction 
problems |38l [39] . 

The effectiveness of the backtracking can be appreci¬ 
ated in panel (b) of Fig. where the order parameter 
Sres/-Wes is shown for r = 0 and r = 0.5, together with 
linear hts to these datasets and to the r = 0.9 dataset 
(black line). We observe that the algorithmic thresh¬ 
old for BSP is much larger (on the scale measuring 
the relative distance from the SAT-UNSAT threshold) 
that the one for SID (i.e. r = 0 dataset). 

For random 3-SAT the algorithmic threshold of BSP, 
run with r = 0.9, practically coincide with the SAT- 
UNSAT threshold as (see Fig. [^, thus providing a 
strong evidence that BSP can find solutions in the en¬ 
tire SAT phase. The estimate for the freezing thresh¬ 
old af = 4.254(9) obtained in Ref. [30] from N < 100 
data is likely to be too small and affected by strong fi¬ 
nite size effects, given that all solutions found by BSP 
for N = 10® are unfrozen, even beyond the estimated 
af. Moreover we have estimated ar = 4.2635(10) 
improving the data of Ref. m and the inequality 
ctr < ctf < as makes the above estimate for af not 
very meaningful. 

Computational complexity As explained in 
Methods, the BSP algorithm performs /“^(l — r)~^ 
steps roughly, where at each step fN variables are ei¬ 
ther assigned [with prob. 1/(1 -|- r)] or released [with 
prob. r/(l -|-r)]. At the beginning of each step, the al¬ 
gorithm solves the SP equations with a mean number 
r] of iterations. The average r] is computed only on in¬ 
stances where SP always converges, as is usually done 
for incomplete algorithms (on the remaining problems 
the number of iterations reaches the upper limit set 
by the user, and then BSP exit, returning failure). 
Fig.j^shows that r] is actually a small number changing 
mildly with a and N both for K = 3 and K = 4. The 
main change that we observe is in the fluctuations of rj 
that become much larger approaching Ug. We expect 
r] to eventually grow as 0{log{N)), but for the sizes 
studied we do not observe such a growth. 

After convergence to a fixed point, the BSP algo¬ 
rithm just need to sort local marginals, thus the total 
number of elementary operations to solve an instance 
grows as /“^(l — r)~^{af7]N + a 2 NlogN), where oi 
and 02 are constants. Moreover, given that the sorting 
of local marginals does not need to be strict (i.e. a par¬ 
tial sorting m running in 0{N) time can be enough), 
we have that in practice the algorithm runs in a time 
almost linear in the problem size N. 

Whitening procedure Given that the BSP algo- 
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Figure 3: BSP algorithmic threshold on random 
3-SAT problems. Same as Fig. for AT = 3. The 
estimate for the freezing threshold af = 4.254(9) mea¬ 
sured on small problems in Ref. m is not very mean¬ 
ingful, given our new estimate for the rigidity threshold 
ar = 4.2635(10), and the observation that all solutions 
found by BSP are not frozen. Shaded areas are the 
statistical uncertainties on the thresholds. A linear fit 
to the residual complexity (brown line) extrapolates 
to zero slightly beyond the SAT-UNSAT threshold, at 
q/Bsp ~ 4.268, strongly suggesting BSP can find solu¬ 
tions in the entire SAT phase for A = 3 in the large 
N limit. The black line is a linear fit vanishing at ag. 
Error bars are sem. 
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Figure 4: BSP convergence time. The mean num¬ 
ber r] of iterations to reach a fixed point of SP equations 
grows very mildly with a and N, both for K = 3 (a) 
and K = A (b). Error bars are standard deviations. 


rithm is able to find solutions even very close to the 
rigidity threshold ar, it is natural to check whether 
these solutions have frozen variables or not. We con¬ 
centrate on solutions found for random 3-SAT prob¬ 
lems with N = 10®, since the large size of these prob- 
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lems makes the analysis very clean. 

On each solution found we run the whitening proce¬ 
dure (first introduced in [H] and deeply discussed in 
|42[ I26|i. that identifies frozen variables by assigning 
the joker state * to unfrozen (white) variables, i.e. vari¬ 
ables that can take more than one value without violat¬ 
ing any clause and thus keeping the formula satisfied. 
At each step of the whitening procedure, a variables is 
considered unfrozen (and thus assigned to *) if it be¬ 
longs only to clauses which either involve a * variable 
or are satisfied by another variable. The procedure is 
continued until all variables are * or a fixed point is 
reached: non-* variables at the fixed point correspond 
to frozen variables in the starting solution. 

We uncover that all solutions found by BSP are con¬ 
verted to all-* by running the whitening procedure, 
thus showing that solutions found by BSP have no 
frozen variables. This is somehow expected, according 
to the conjecture discussed in the Introduction: finding 
solutions in a frozen cluster would take an exponential 
time, and so the BSP algorithm actually finds solu¬ 
tions at very large a values by smartly focusing on the 
sub-dominant unfrozen clusters. 

The whitening procedure leads to a relaxation of the 
number of non-* variables as a function of the num¬ 
ber of iterations t that follows a two steps relaxation 
process [25] with an evident plateau, see panel (a) in 
Fig.i that becomes longer increasing a towards the al¬ 
gorithmic threshold. The time for leaving the plateau, 
scales as the time r(c) for reaching a fraction c on non- 

* variables (with c smaller than the plateau value). 
The latter has large fluctuations from solution to solu¬ 
tion, as shown in panel (b) of Fig. for c = 0.4 (very 
similar, but shifted, histograms are obtained for other 
c values). However, after leaving the plateau, the dy¬ 
namics of the whitening procedure is the same for each 
solution. Indeed plotting the mean fraction of non-* 
variables as a function of the time to reach the all- 

* configuration, r(0) — t, we see that fluctuations are 
strongly suppressed and the relaxation is the same for 
each solution (see panel (c) in Fig. [^. 

Critical exponent for the whitening time di¬ 
vergence In order to quantify the increase of the 
whitening time approaching the algorithmic threshold, 
and inspired by critical phenomena, we check for a 
power law divergence as a function of a) or Sres, 

which are linearly related. In Figwe plot in a double 
logarithmic scale the mean whitening time t(c) as a 
function of the residual complexity Bj-es, for different 
choices of the fraction c of non-* variables defining the 
whitening time. Data points are fitted via the power 
law r(c) = A(c)-|-B(c)B“g, where the critical exponent 
V is the same for all the c values. Joint interpolations 
return the following best estimates for the critical ex- 
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Figure 5: Whitening random 3-SAT solntions. 

(a) The mean fraction of non-* variables during the 
whitening procedure applied to all solutions found by 
the BSP algorithm goes to zero, following a two steps 
process. The relaxation time grows increasing a to¬ 
wards the algorithmic threshold. The horizontal line 
is the SP prediction for the fraction of frozen variables 
in typical solutions at a = 4.262 and the comparison 
with the data shows that solutions found by BSP are 
atypical, (b) Histograms of the whitening times, de¬ 
fined as the number of iterations required to reach a 
fraction 0.4 of non-* variables. Increasing a both the 
mean value and the variance of the whitening times 
grow, (c) Averaging the fraction of non-* variables 
at fixed r(0) —t, i.e. fixing the time to the all-* fixed 
point, we get much smaller errors than in panel (a), 
suggesting that the whitening procedure is practically 
solution-independent once the plateau is left. In all the 
panels, error bars are sem. 
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Figure 6: Critical exponent for the whitening 
time divergence. The whitening time t(c), defined 
as the mean time needed to reach a fraction c of non-* 
variables in the whitening procedure, is plotted in a 
double logarithmic scale as a function of Sres for ran¬ 
dom 3-SAT problems with N = 10® (upper dataset) 
and random 4-SAT problems with N = 5 x (lower 
dataset). The whitening time measured with different 
c values seems to diverge at the algorithmic thresh¬ 
old, where the residual complexity S^es vanishes. The 
lines are power law fits with exponent v = 0.281(6) for 
K = 3 and u = 0.269(5) for A'= 4. Error bars are sem. 

ponent: u = 0.281(6) for A' = 3 and v = 0.269(5) for 
AT = 4, where the uncertainties are only fitting errors. 
The two estimate turn out to be compatible within er¬ 
rors, thus suggesting a sort of nniversality for the crit¬ 
ical behavior close to the algorithmic threshold 

Nonetheless a word of caution is needed since the so¬ 
lutions we are nsing as starting points for the whitening 
procedure are atypical solutions (otherwise they would 
likely contain frozen variables and would not flow to 
the all-* configuration under the whitening procedure). 
So, while finding universal critical properties in a dy¬ 
namical process is definitely a good news, how to relate 
it to the behavior of the same process on typical solu¬ 
tions it is not obvious (and indeed for the whitening 
process starting from typical solutions one would ex¬ 
pect the naive mean field exponent u = 1/2, which is 
much larger than the one we are finding). 

Discussion 

We have studied the Backtracking Survey Propaga¬ 
tion (BSP) algorithm for finding solutions in very large 
random AT-SAT problems and provided numerical ev¬ 
idence that it works much better than any previously 
available algorithm. That is, BSP has the largest al¬ 
gorithmic threshold known at present. The main rea¬ 


son for its superiority is the fact that variables can be 
re-assigned at any time during the run, even at the 
very end. In other solving algorithms that may look 
similar, as e.g. survey propagation reinforcement m, 
re-assignment of variables actually takes place mostly 
at the beginning of the run, and this is far less efficient 
in hard problems. Even doing a lot of helpfnl back¬ 
tracking, the BSP running time is still 0{N log N) in 
the worst case, and thanks to this it can be used on 
very large problems with millions of constraints. 

Eor K = 3 the BSP algorithm finds solutions prac¬ 
tically up to the SAT-UNSAT threshold cts, while for 
AT = 4 a tiny gap to the SAT-UNSAT threshold still 
remains, but the algorithmic threshold seems to 
be located beyond the rigidity threshold in the large 
N limit. Beating the rigidity threshold, i.e. finding so¬ 
lutions in a region where the majority of solutions be¬ 
longs to clusters with frozen variables, is hard, but not 
impossible (while going beyond Of should be impossi¬ 
ble). Indeed, even under the assumption that finding 
frozen solutions takes an exponential time in N, very 
smart polynomial time algorithms can look for a so¬ 
lution in the snb-dominant unfrozen clusters [351 ES] ■ 
BSP belongs to this category, as we have shown that 
all solutions found by BSP have no frozen variables. 

One of the main questions we tried to answer with 
our extensive numerical simulations is whether BSP is 
reaching (or approaching closely) the ultimate thresh¬ 
old aa for polynomial time algorithms solving large 
random A'-SAT problems. Under the assumption that 
frozen solutions cannot be found in polynomial time, 
such an algorithmic threshold Oa would coincide with 
the freezing transition at a{ (i.e. when the last nn- 
frozen solntion disappears). Unfortunately for random 
AT-SAT the location of a{ is not known with enough 
precision to allow us to reach a definite answer to this 
question. It would be very interesting to run BSP 
on random hypergraph bicoloring problems, where the 
threshold values are known [431 144] and a very recent 
work has shown that the large deviation function for 
the number of unfrozen clusters can be computed [27] . 

It is worth noticing that the BSP algorithm is easy 
to parallelize, since most of the operations are local 
and do not require any strong centralized control. Ob¬ 
viously the effectiveness of a parallel version of the al¬ 
gorithm would largely depend on the topology of the 
factor graph representing the specific problem: if the 
factor graph is an expander, then splitting the prob¬ 
lem on several cores may require too much inter-core 
bandwidth, but in problems having a natural hierar¬ 
chical structure the parallelization may lead to further 
performance improvements. 

The backtracking introduced in the BSP algorithm 
helps a lot in correcting errors made during the partial 
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assignment of variables and this allows the BSP algo¬ 
rithm to reach solutions at large a values. Clearly we 
pay the price that a too frequent backtracking makes 
the algorithm slower, but it seems worth paying such 
a price to approach the SAT-UNSAT threshold closer 
than any other algorithm. 

A natural direction to improve this class of algo¬ 
rithms would be to used biased marginals focusing on 
solutions which are easier to be reached by the algo¬ 
rithm itself. For example in the region a > aj- the 
measure is concentrated on solutions with frozen vari¬ 
ables, but these can not be really reached by the algo¬ 
rithm. The backtracking thus intervenes and corrects 
the partial assignment until a solution with unfrozen 
variables is found by chance. If the marginals could be 
computed from a new biased measure which is concen¬ 
trated on the unfrozen clusters, this could make the 
algorithm go immediately in the right direction and 
much less backtracking would be hopefully needed. 


Methods 

Survey Inspired Decimation (SID) A detailed 
description of the SID algorithm can be found in 
Refs. [1211131 [32] • The SID algorithm is based on the 
survey propagation (SP) equations derived by the cav¬ 
ity method MM, that can be written in a compact 
way as 



= n ’ 

(1) 


j&da\i 


'^i^a 


(2) 


= 1 “ n “ 'mb^i) , 

(3) 




the set 

of variables in clause a, 

and d+ 


(resp. d“) is the set of clauses containing Xi, excluding 
a itself, satisfied (resp. not satisfied) when the variable 
Xi is assigned to satisfy clause a. 

The interpretation of the SP equations is as follows: 
rha^i represents the fraction of clusters where clause a 
is satisfied solely by variable Xi (that is, xi is frozen by 
clause a), while rrii^a is the fraction of clusters where 
Xi is frozen to an assignment not satisfying clause a. 

The SP equations impose 2KM self-consistency con¬ 
ditions on the 2KM variables {rrii^aji^a^i} living on 
the edges of the factor graph [7], that are solved in 
an iterative way, leading to a message passing algo¬ 
rithm (MPA) |1], where outgoing messages from a fac¬ 
tor graph node (variable or clause) are functions of 
the incoming messages. Once the MPA reaches a fixed 
point that solves the SP equations, the 


number of clusters can be estimated via the complexity 
S = logA4ius = S, + ^(1 - iF,)S, , (4) 

i a 

= log (l- n , Si = log(l- TT+TT") (5) 

3&da 

with Trf = 1 - n (1 - ml^i), (6) 

feeaf 


where Ka is the length of clause a (initially Ka = K) 
and df (resp. d~) is the set of clauses satisfied by 
setting Xi = 1 (resp. Xi = —1). The SP fixed point 
messages also provide information about the fraction 
of clusters where variable Xi is forced to be positive 
{wf ), negative {w~) or not forced at all (1 — wf —w~) 


-1 + — 
1 — ttZ TT • 


(7) 


The SID algorithm then proceed by assigning vari¬ 
ables (decimation step). According to SP equations, 
assigning a variable Xi to its most probable value (i.e., 
setting Xj = 1 if wf > w~ and viceversa), the number 
of clusters gets multiplied by a factor, called bias 


bi = l - min(t(;j^, re- ). 


( 8 ) 


With the aim of decreasing the lesser the number of 
cluster and thus keeping the largest the number of so¬ 
lutions in each decimation step, SID assigns/decimate 
variables with the largest hi values. In order to keep 
the algorithm efficient, at each step of decimation a 
small fraction / of variables is assigned, such that in 
OilogN) steps of decimation a solution can be found. 

After each step of decimation, the SP equations are 
solved again on the subproblem, which is obtained by 
removing satisfied clauses and by reducing clauses con¬ 
taining a false literal (unless a zero-length clause is 
generated, and in that case the algorithm returns a 
failure). The complexity and the biases are updated 
according to the new fixed point messages, and a new 
decimation step is performed. 

The main idea of the SID algorithm is that fixing 
variables which are almost certain to their most prob¬ 
able value, one can reduce the size of the problem with¬ 
out reducing too much the number of solutions. The 
evolution of the complexity S during the SID algo¬ 
rithm can be very informative |35|. Indeed it is found 
that, if S becomes too small or negative, the SID al¬ 
gorithm is likely to fail, either because the iterative 
method for solving the SP equations no longer con¬ 
verges to a fixed point or because a contradiction is 
generated by assigning variables. In these cases the 
SID algorithm returns a failure. On the contrary, if 
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S always remains well positive, the SID algorithm re¬ 
duces so much the problem, that eventually a trivial 
SP fixed point, = 0, is reached. This 

is a strong hint that the remaining subproblem is easy 
and the SID algorithm tries to solve it by WalkSat m- 

A careful analysis of the SID algorithm for random 
3-SAT problems of size N = O(IO^) shows that the al¬ 
gorithmic threshold achievable by SID is a®™ = 4.2525 
135], which is close, but definitely smaller than the 
SAT-UNSAT threshold Os = 4.2667. 

The running time of the SID algorithm experimen¬ 
tally measured is 0{Nlog{N)) [32]. 

Backtracking Survey Propagation (BSP) Will¬ 
ing to improve the SID algorithm to find solutions also 
in the region a®™ < a < Og, one has to change the 
way variables are assigned. The fact the SID algo¬ 
rithm assigns each variable only once is clearly a strong 
limitation, especially in a situation where correlations 
between variables becomes extremely strong and long- 
ranged. In difficult problems it can easily happen that 
one realizes that a variable is taking the wrong value 
only after having assigned some of its neighbours vari¬ 
ables. However the SID algorithm is not able to solve 
this kind of frustrating situations. 

The BSP algorithms [36] tries to solve this kind 
of problematic situations by introducing a new back¬ 
tracking step, where a variable already assigned can 
be released and eventually re-assigned in a future deci¬ 
mation step. It is not difficult to understand when it is 
worth releasing a variable. The bias bi in terms of the 
SP fixed point messages {rh*^^^\a&di arriving in i can 
be computed also for a variable xi already assigned: 
if the bias bi, that was large at the time the variable 
Xi was assigned, gets strongly reduces by the effect of 
assigning other variables, then it is likely that releas¬ 
ing the variable Xi may be beneficial in the search for 
a solution. So both the variables to be fixed in the 
decimation step and the variables to be released in the 
backtracking step are chosen according to their biases 
bi'. the variables to be fixed have the largest biases and 
the variables to be released have the smallest biases. 

The BSP algorithm then proceeds similarly to SID, 
by alternating the iterative solution to the SP equa¬ 
tions and a step of decimation or backtracking on a 
fraction / of variables in order to keep the algorithm 
efficient (in all our numerical experiments we have used 
/ = 10“^). The choice between a decimation or a 
backtracking step is taken according to a stochastic 
rule (unless there are no variables to unset), where 
the parameter r G [0,1) represents the ratio between 
backtracking steps to decimation steps. Obviously for 
r = 0 we recover the SID algorithm, since no back¬ 
tracking step is ever done. Increasing r the algorithm 
becomes slower by a factor 1/(1 — r), because variables 


are reassigned on average 1/(1 — r) times each before 
the BSP algorithm reaches the end, but its complexity 
remains at most 0{N\ogN) in the problem size. 

The BSP algorithm can stop for the same reasons the 
SID algorithm does: either the SP equations can not 
be solved iteratively or the generated subproblem has 
a contradiction. Both cases happen when the complex¬ 
ity S becomes too small or negative. On the contrary if 
the complexity remain always positive the BSP even¬ 
tually generate a subproblem where all SP messages 
are null and on this subproblem WalkSat is called. 

Data Availability Statement 

The numerical codes used in this study and the data 
that support the findings are available from the corre¬ 
sponding author upon request. 
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